Cats & Co: Categorical Time Series Coclustering
Abstract
We propose a novel method for clustering and exploratory analysis of temporal event sequence data (also known as categorical time series) based on three-dimensional data grid models. A data set of temporal event sequences can be represented as a set of three-dimensional points, each defined by three variables: a sequence identifier, a time value and an event value. Instantiating data grid models on these 3D points turns the problem into 3D coclustering: the sequences are partitioned into clusters, the time variable is discretized into intervals and the events are partitioned into clusters. The cross-product of the univariate partitions forms a multivariate partition of the representation space, i.e., a grid of cells, which also serves as a nonparametric estimator of the joint distribution of the sequence, time and event dimensions. Sequences are thus grouped together because they have similar joint distributions of time and events, i.e., similar distributions of events along the time dimension. The best data grid is computed using a parameter-free Bayesian model selection approach. We also suggest several criteria for exploiting the resulting grid through agglomerative hierarchies, for interpreting the clusters of sequences, and for characterizing their components through insightful visualizations. Extensive experiments on both synthetic and real-world data sets demonstrate that data grid models are efficient and effective, and that they discover meaningful underlying patterns in categorical time series data.
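To make the representation described in the abstract concrete, the following minimal sketch flattens a few toy event sequences into (sequence, time, event) points and counts how many points fall into each cell of a 3D grid. Everything here is an illustrative assumption: the function names, the toy data and the hand-picked partitions are hypothetical, and the sketch does not implement the Bayesian model selection that the paper uses to find the best grid. It only shows how a given grid yields normalized cell frequencies that act as a piecewise-constant estimate of the joint distribution.

```python
from bisect import bisect_right
from collections import Counter


def to_points(sequences):
    """Flatten {seq_id: [(time, event), ...]} into (seq_id, time, event) triples."""
    return [(sid, t, e) for sid, seq in sequences.items() for t, e in seq]


def grid_counts(points, seq_cluster, time_cuts, event_cluster):
    """Count points per cell (sequence cluster, time interval, event cluster).

    time_cuts is a sorted list of cut points; a time t falls into the interval
    whose index is the number of cut points less than or equal to t.
    """
    counts = Counter()
    for sid, t, e in points:
        cell = (seq_cluster[sid], bisect_right(time_cuts, t), event_cluster[e])
        counts[cell] += 1
    return counts


def joint_estimate(counts):
    """Normalize cell counts into empirical cell probabilities."""
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


# Toy data (hypothetical): three sequences of (time, event) pairs.
sequences = {
    "s1": [(1, "a"), (2, "b"), (8, "c")],
    "s2": [(1, "a"), (3, "a"), (9, "c")],
    "s3": [(2, "c"), (7, "a")],
}

# Hand-picked partitions (assumptions, not learned from the data).
seq_cluster = {"s1": 0, "s2": 0, "s3": 1}
time_cuts = [5]
event_cluster = {"a": 0, "b": 0, "c": 1}

points = to_points(sequences)
counts = grid_counts(points, seq_cluster, time_cuts, event_cluster)
density = joint_estimate(counts)
```

In this toy grid, sequences s1 and s2 share a cluster because their events fall into the same cells: events from the first event cluster early, events from the second late. That is the sense in which sequences with similar distributions of events over time end up grouped together.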
Similar resources
Feature Extraction over Multiple Representations for Time Series Classification
We suggest a simple yet effective and parameter-free feature construction process for time series classification. Our process is decomposed into three steps: (i) we transform the original data into several simple representations; (ii) on each representation, we apply a coclustering method; (iii) we use the coclustering results to build new features for time series. It results in a new transactional (i.e....
Symbolic Representation of Time Series: A Hierarchical Coclustering Formalization
The choice of an appropriate representation remains crucial for mining time series, particularly to reach a good trade-off between the dimensionality reduction and the stored information. Symbolic representations constitute a simple way of reducing the dimensionality by turning time series into sequences of symbols. SAXO is a data-driven symbolic representation of time series which encodes typica...
Bayesian coclustering of Anopheles gene expression time series: study of immune defense response to multiple experimental challenges.
We present a method for Bayesian model-based hierarchical coclustering of gene expression data and use it to study the temporal transcription responses of an Anopheles gambiae cell line upon challenge with multiple microbial elicitors. The method fits statistical regression models to the gene expression time series for each experiment and performs coclustering on the genes by optimizing a joint...
Evidence for representations of perceptually similar natural categories by 3-month-old and 4-month-old infants.
The paired-preference procedure was used in a series of experiments to explore the abilities of infants aged 3 and 4 months to categorize photographic exemplars from natural (adult-defined) basic-level categories. The question of whether the categorical representations that were evidenced excluded members of a related, perceptually similar category was also investigated. Experiments 1-3 reveale...
Exclusive Partition in FCM-type Co-clustering and Its Application to Collaborative Filtering
The task of collaborative filtering is closely related to coclustering, in which personalized recommendation is achieved by connecting users with the items they are likely to prefer. FCM-type coclustering extracts user-item co-clusters, in which users are assigned to clusters in an exclusive manner while item partitions are not necessarily exclusive and each item can be shared (rejected) by multiple (all) cl...
Journal title:
- CoRR
Volume abs/1505.01300
Publication date: 2015